Format

  • This is a workshop.
  • But it is short.
  • We will do some hands-on stuff but there is not a lot of time for practice.

Slides and Material

Who Am I?

Julia Haaf

Assistant Professor at the Psychological Methods Department, University of Amsterdam

email: j.m.haaf@uva.nl

Job alert: Two temporary assistant professor positions at the Psychological Methods Unit

Who Are You?



Overview

  1. How can we minimize mistakes in psychological science?
  2. What does it mean to have a fully reproducible pipeline?
  3. What is git and how can I use it?
  4. Using git for radically open data.

1. How can we minimize mistakes in psychological science?

Replicability Crisis

  • Failures to replicate (e.g. Ebersole et al., 2016; Open Science Collaboration, 2015; Wagenmakers et al., 2016).
  • Fraud (Bhattacharjee, 2013).
  • Improbable findings have been published in top-tier journals (e.g. Bem, 2011).



Proposed Solutions

  • Change the incentive structure (e.g., Nosek et al., 2015; Wagenmakers et al., 2012).
  • Be transparent and open (e.g. Rouder, 2016; Wicherts et al., 2011).
  • Change the statistical approach (e.g. Benjamin et al., 2018; Erdfelder, 2010; Rouder et al., 2016)
We assume people do stuff on purpose.



Mistakes in Psychological Science

Sources of mistakes:

  • Errors when programming the experiment (e.g. randomization).
  • Equipment failure (e.g. responses are collected unreliably).
  • Lost data.
  • Errors when coding the analysis (e.g. with data cleaning).
  • Errors when reporting the analysis (e.g. typos).


Lab Practices Under the Microscope

Think about your own experience:

  • Is there time pressure to collect data?
  • Are there checks for coding experiments/surveys?
  • Are there checks for running analyses?


Consequences

  • Prevalence: Roughly half the publications in 30 years of literature contained at least one malformed statement of a statistical test (Nuijten, Hartgerink, van Assen, Epskamp, & Wicherts, 2016).
  • Bias: Simple mistakes tend to go in researchers’ preferred direction (Gould, 1996).
  • Persistence: Once in the literature mistakes are almost impossible to detect (Rouder, Haaf, & Snyder, 2019).

High-Reliability Organizations

Principles for Avoiding Mistakes

  1. Sensitivity to operations: Focus on processes instead of outcomes.
  2. Preoccupation with failure: Look for ways to proactively anticipate and avoid mistakes, and take small mistakes seriously.
  3. Resilience in the face of failure and reluctance to simplify: In a resilient lab, when things go wrong — and they will — people talk about them, document them, and learn from them.
  4. Deference to expertise: Each lab member has certain expertise.


From Principles to Practices

  1. Adopting a lab culture focused on learning from mistakes.
  2. Implementing radical computer automation.
  3. Standardizing organizational strategies across lab members.
  4. Ensuring that statistical analyses are coded.
  5. Developing expanded manuscripts in which documentation of analyses is woven into the manuscript files.

2. What does it mean to have a fully reproducible pipeline?

Science vs. Pseudo-Science

“An article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.” Claerbout & Karrenbach, 1992

Fully Reproducible

  • Reproducible analysis.
  • Reproducible graphs and tables.
  • Reproducible numbers in text.

Who Can Reproduce When?

  • Ideal: Code containerization.
  • Minimal: Provide a list of packages and software needed (Open source!).
  • Utopian: “I will be able to fully reproduce my analysis by 2035.”

A Tool for Reproducibility

git

  • Versioning tool for collaboratively working on a product.
  • Avoid retaining multiple versions of the same work product (e.g. ‘paper_final_final_B.docx’).
  • Tutorial: Vuorre & Curley (n.d.).

Short Break (5 minutes)

3. What is git and how can I use it?

What I would like to show you about git

  • How to use a terminal
  • Git
    • What is it good for?
    • What is it?
    • What can it do?
  • Set-up for your computer
    • GUI/terminal
    • R Studio & git
    • SSH
    • Set name & email address
  • Your first repo
    • Github and GitLab
    • In R Studio
    • gitignore
    • README
  • Workflow
    • Add, Commit, Push
    • Diff
    • Merge, Branches, Tagging… (all the cool stuff)
    • What happens if something goes wrong? (And it will.)

What we have time for

  • How to use a terminal
  • Git
    • What is it good for?
    • What is it?
    • What can it do?
  • Set-up for your computer
    • GUI/terminal
    • R Studio & git
    • SSH
    • Set name & email address
  • Your first repo
    • Github and GitLab
    • In R Studio
    • gitignore
    • README
  • Workflow
    • Add, Commit, Push
    • Diff
    • Merge, Branches, Tagging… (all the cool stuff)
    • What happens if something goes wrong? (And it will.)

What is it good for?

Version control:

  • Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.
  • History of all changes (who, what, when).
  • Helps to avoid mistakes (working on the wrong version, deleting, …).
  • Merging changes of multiple collaborators in one file.

What is it?

What can it do?

  • A lot! Which is why we only need a part of its functionality.
  • Working on one product in (large) teams.
  • Working on things that can break.
  • git can only integrate and show changes in text files.
  • Binary files (images, etc.) can be tracked and uploaded, but their changes cannot be shown.
  • Track changes for MS Word is improving.

Set-up for your computer

R Studio & git

Tools ➤ Global Options ➤ Git/SVN.

Make sure the first checkbox is ticked and that the path to “git.exe” (Windows) appears in the Git executable field.

Set name & email address

  • Open the Terminal in R Studio.
  • Set an email address and user name for git.
git config --global user.email "myemail@email.com"
git config --global user.name "My commit name"

My first repo

Github

New Repository

Settings

Clone It!

In R Studio

File ➤ New Project ➤ Version Control ➤ Git

  • You will have to type in your user name and password for github (for HTTPS, github now asks for a personal access token in place of the account password).
  • Initializes a local git repository with an R project (the project opens in a clean R Studio session).
  • You can see the README file from github.
  • Adds a .gitignore file.


Task IV

  • Make a new repository.
  • Clone it using R Studio to make a local repository.



gitignore

  • Specifies intentionally untracked files to ignore.
  • Each line in a gitignore file specifies a pattern.
  • R Studio pre-specifies some useful patterns.
  • For R Markdown: Cache files! .tiff, .eps, .rdb, .rdx
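
As an illustration, a gitignore file along these lines could be written from the terminal. This is only a sketch: the scratch repository and the file names (analysis.R, figure1.tiff) are hypothetical, and the patterns you need depend on your project. `git check-ignore` reports which paths match a pattern.

```shell
# Work in a scratch repository so nothing real is affected.
demo="$(mktemp -d)"
cd "$demo"
git init --quiet

# One pattern per line; lines starting with # are comments.
cat > .gitignore <<'EOF'
# R session files
.Rhistory
.RData
.Rproj.user/
# R Markdown cache folders and large figure files
*_cache/
*.tiff
*.eps
*.rdb
*.rdx
EOF

# Hypothetical project files: one script, one rendered figure.
touch analysis.R figure1.tiff

# Prints the path if it is ignored; prints nothing for tracked candidates.
git check-ignore figure1.tiff

# The figure is missing from the list of untracked files.
git status --short
```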


README

  • Tell other people (and yourself in a year) why your project is useful, what they can do with your project, and how they can use it.
  • On github default README files are Markdown files!

Task V

  • Write a (short) README file for your test repository.
  • Use Markdown formatting.



Git Workflow

Do some work

Git Add


git add .gitignore myfirstrepo.Rproj



The terminal can autocomplete file names: press Tab!

Git Commit


git add .gitignore myfirstrepo.Rproj
git commit -am "My first commit"



Commits always have a commit message.

Commit message


Git Push


git add .gitignore myfirstrepo.Rproj
git commit -am "My first commit"
git push

Congrats! You have done it! Now local and remote repositories are up to date!
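
The three steps can be tried without a github account. The sketch below rests on a few assumptions: a bare repository in a temporary folder plays the role of the remote, the file contents are placeholders, and the name and email are set only for this demo repository.

```shell
# A bare repository in a temp folder stands in for the remote (e.g. github).
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"

# Clone it to get a local repository.
git clone --quiet "$work/remote.git" "$work/myfirstrepo"
cd "$work/myfirstrepo"

# Identity for this demo repository only (placeholder values).
git config user.email "myemail@email.com"
git config user.name "My commit name"

# Do some work: create the two files with placeholder content.
echo ".Rproj.user" > .gitignore
echo "Version: 1.0" > myfirstrepo.Rproj

git add .gitignore myfirstrepo.Rproj      # stage the files
git commit --quiet -m "My first commit"   # record a snapshot locally
git push --quiet -u origin HEAD           # send it to the remote

# Local and remote now hold the same commit.
git log --oneline
git status --short --branch
```

The `-u origin HEAD` part is only needed for the first push to an empty remote; afterwards a plain `git push` is enough.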

Task VI

  • Add your unstaged files.
  • Commit the changes.
  • Push to the remote repository.


Git Pull

Before you start working on the project the next time:


git pull


Pull, work some more, repeat.

What changed since the last commit?

git diff
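
A small sketch of what `git diff` reports, using a hypothetical scratch repository and a made-up file (notes.txt). Plain `git diff` compares the working files with the staging area, `git diff --staged` compares the staging area with the last commit, and `git diff HEAD` covers both.

```shell
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.email "myemail@email.com"   # placeholder identity,
git config user.name "My commit name"       # local to this demo

echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# Change the file after the commit.
echo "second line" >> notes.txt

git diff            # working files vs. staging area: shows "+second line"
git add notes.txt
git diff            # prints nothing now, because the change is staged
git diff --staged   # staging area vs. last commit: shows "+second line"
git diff HEAD       # everything that changed since the last commit
```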

What happens if something goes wrong? (And it will.)

  • Remember: You cannot break things.
  • Most likely you have a merge conflict.

Push Conflict

Merge Conflict

  • Resolve the conflict (Choose which changes to keep).
  • Commit, and Push.
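
To see what resolving a conflict looks like, the sketch below provokes one on purpose. Everything in it is hypothetical: two branches of a scratch repository stand in for two collaborators, and the file name and contents are made up. The final push is left out because the scratch repository has no remote.

```shell
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.email "myemail@email.com"   # placeholder identity,
git config user.name "My commit name"       # local to this demo

echo "N = 20" > plan.txt
git add plan.txt
git commit --quiet -m "Add plan"
main_branch="$(git rev-parse --abbrev-ref HEAD)"

# Collaborator A changes the line on a separate branch.
git checkout --quiet -b collaborator
echo "N = 40" > plan.txt
git commit --quiet -am "Increase sample size"

# Collaborator B changes the same line on the original branch.
git checkout --quiet "$main_branch"
echo "N = 30" > plan.txt
git commit --quiet -am "Adjust sample size"

# Merging now stops with a conflict (non-zero exit status).
git merge collaborator || echo "Merge stopped: conflict in plan.txt"

# The file holds both versions between <<<<<<<, ======= and >>>>>>> markers.
cat plan.txt

# Resolve: choose which changes to keep, then add and commit.
echo "N = 40" > plan.txt
git add plan.txt
git commit --quiet -m "Resolve conflict: keep N = 40"
git log --oneline --graph
```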

Summary

  • Add, commit, push, pull.
  • Use it!
  • git documentation and error tracking are great!

4. Using git for radically open data.

Open Data

  • Standard: Data on request.
  • Wicherts, Borsboom, Kats, & Molenaar (2006): Only 11 % of APA journal authors complied with initial request, an additional 16 % with repeated requests. A full 73 % of author teams never complied.
  • More and more: Data on OSF.
  • Needs to be hand-curated, only published when the paper is published.

Born-open Data

  • Rouder (2016)
  • For people who are not meticulous.
  • Commitment to radically open data: You can see all data, even the failed pilots.
  • Nightly (or weekly) automatic upload to github.
  • github.com/PerceptionCognitionLab

Born-open Data

Advantages

  • Increases awareness to make good decisions.
  • No data-management mistakes (no data versions).
  • Automatic backup.
  • Easy data sharing.
  • Long-term availability.

Born-open Data

Elements

  • Shared local storage across experimental computers.
  • Git or Github repository.
  • Execution and scheduling (task scheduler).
git add *.dat.*
git commit -m "automatic commit"
git push
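
A scheduled job runs unattended, so it helps if it does nothing when no new data have arrived (`git commit` exits with an error when nothing is staged). The following is a minimal sketch, not the script from Rouder (2016): the function name, the data file name, and the scratch folders are all hypothetical, and a bare repository stands in for github.

```shell
# Hypothetical upload routine: stage new data files, commit only if
# something was staged, then push.
born_open_upload() {
  cd "$1" || return 1
  # Errors are silenced because the pattern may match no files yet.
  git add -- *.dat.* 2>/dev/null
  # `git diff --cached --quiet` exits 0 when nothing is staged.
  if git diff --cached --quiet; then
    echo "No new data"
    return 0
  fi
  git commit --quiet -m "automatic commit"
  git push --quiet
}

# Demo set-up: a bare repository plays the remote, a clone is the data folder.
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/data"
cd "$work/data"
git config user.email "myemail@email.com"   # placeholder identity,
git config user.name "My commit name"       # local to this demo
echo "Born-open data demo" > README
git add README
git commit --quiet -m "Initial commit"
git push --quiet -u origin HEAD

# A made-up data file appears, as if written by an experiment.
echo "1 0.532" > exp1.dat.001

born_open_upload "$work/data"   # commits and pushes the new file
born_open_upload "$work/data"   # prints "No new data"
git log --oneline
```

In practice the function body would live in a script that the task scheduler (or cron on Linux and macOS) calls once per night with the path to the shared data folder.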

Born-open Data

Concerns

  • Not suitable for very large files (neuroimaging data, etc.).
  • Not a properly curated archive.
  • Github repositories can be connected to OSF.
  • Privacy concerns.
  • Being scooped.
  • Being vulnerable.

References

Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100, 407–425. Retrieved from http://dx.doi.org/10.1037/a0021524

Benjamin, D. J., Berger, J., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., … Johnson, V. (2018). Redefine statistical significance. Nature Human Behaviour, 2, 6.

Bhattacharjee, Y. (2013). The mind of a con man. New York Times, April 26, 2013. Retrieved from http://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html?pagewanted=all

Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., … Nosek, B. A. (2016). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67, 68–82. Retrieved from http://ezid.cdlib.org/id/doi:10.17605/OSF.IO/QGJM5

Erdfelder, E. (2010). A note on statistical analysis. Experimental Psychology, 57, 1–4. Retrieved from https://doi.org/10.1027/1618-3169/a000001

Gould, S. J. (1996). The mismeasure of man. New York: WW Norton & Company.

Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., … Yarkoni, T. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425.

Nuijten, M. B., Hartgerink, C. H., van Assen, M. A., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48(4), 1205–1226.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6521), 943. Retrieved from https://doi.org/10.1126/science.aac4716

Rouder, J. N. (2016). The what, why, and how of born-open data. Behavior Research Methods, 48, 1062–1069. Retrieved from https://doi.org/10.3758/s13428-015-0630-z

Rouder, J. N., Haaf, J. M., & Snyder, H. K. (2019). Minimizing mistakes in psychological science. Advances in Methods and Practices in Psychological Science, 2(1), 3–11. Retrieved from https://doi.org/10.1177/2515245918801915

Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E.-J. (2016). Is there a free lunch in inference? Topics in Cognitive Science, 8, 520–547.

Vuorre, M., & Curley, J. P. (n.d.). Curating research assets: A tutorial on the git version. Retrieved from https://psyarxiv.com/6tzh8

Wagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., Acosta, A., Adams, R. B., Jr., … Zwaan, R. A. (2016). Registered replication report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science, 11(6), 917–928. Retrieved from https://doi.org/10.1177/1745691616674458

Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 627–633. Retrieved from https://doi.org/10.1177/1745691612463078

Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS ONE, 6(11), e26828. Retrieved from http://www.plosone.org/annotation/listThread.action?root=19627

Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61(7), 726–728. Retrieved from http://wicherts.socsci.uva.nl/datasharing.pdf